Generalized and bounded policy iteration for finitely-nested interactive POMDPs: scaling up

Authors

  • Ekhlas Sonu
  • Prashant Doshi
Abstract

Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, which makes its evaluation and improvement costly. Bounded policy iteration keeps the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration. In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how to perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative; the generalization here makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling others, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains and demonstrate its properties and scalability.
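To make the "evaluation" step mentioned in the abstract concrete, the following is a minimal sketch of how a deterministic finite state controller could be evaluated on a single-agent POMDP by iterating the standard policy-evaluation equations. It is not the authors' algorithm for interactive POMDPs: the function name `evaluate_controller`, the array layouts `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`, and the toy two-state numbers are all assumptions made only for illustration.

```python
def evaluate_controller(actions, successor, T, O, R,
                        gamma=0.95, tol=1e-9, max_iters=100_000):
    """Iteratively evaluate a deterministic finite state controller.

    actions[n]      -> action taken in controller node n
    successor[n][o] -> next node after receiving observation o in node n
    T[a][s][s2]     -> probability of moving from state s to s2 under action a
    O[a][s2][o]     -> probability of observing o after reaching s2 under a
    R[a][s]         -> immediate reward for taking action a in state s
    Returns V with V[n][s] = expected discounted value of starting in
    node n when the true state is s.
    """
    n_nodes = len(actions)
    n_states = len(R[0])
    n_obs = len(O[0][0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        delta = 0.0
        for n in range(n_nodes):
            a = actions[n]
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for o in range(n_obs):
                        future += (T[a][s][s2] * O[a][s2][o]
                                   * V[successor[n][o]][s2])
                new_V[n][s] = R[a][s] + gamma * future
                delta = max(delta, abs(new_V[n][s] - V[n][s]))
        V = new_V
        if delta < tol:  # stop once the values have stopped changing
            break
    return V


# Hypothetical toy problem: 2 states, 2 actions, 2 observations.
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, -1.0], [0.0, 0.0]]
actions = [0, 1]             # node 0 plays action 0, node 1 plays action 1
successor = [[0, 1], [0, 0]]  # node transitions indexed by observation
values = evaluate_controller(actions, successor, T, O, R)
```

Because the cost of each sweep grows with the number of controller nodes, this sketch also hints at why an unbounded controller becomes expensive to evaluate and why fixing its size, as bounded policy iteration does, is attractive.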

Similar resources

Generalized Point Based Value Iteration for Interactive POMDPs

We develop a point based method for solving finitely nested interactive POMDPs approximately. Analogously to point based value iteration (PBVI) in POMDPs, we maintain a set of belief points and form value functions composed of those value vectors that are optimal at these points. However, as we focus on multiagent settings, the beliefs are nested and computation of the value vectors relies on p...


Generalized and Bounded Policy Iteration for Interactive POMDPs

Policy iteration algorithms for solving partially observable Markov decision processes (POMDP) offer the benefits of quicker convergence and the ability to operate directly on the policy, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations due to which its evaluation and improvement become costly. Bounded policy iter...


Approximate Solutions of Interactive POMDPs Using Point Based Value Iteration

We develop a point based method for solving finitely nested interactive POMDPs approximately. Analogously to point based value iteration (PBVI) in POMDPs, we maintain a set of belief points and form value functions composed of only those value vectors that are optimal at these points. However, as we focus on multiagent settings, the beliefs are nested and the computation of the value vectors re...


Anytime Point Based Approximations for Interactive POMDPs

Partially observable Markov decision processes (POMDPs) have been largely accepted as a rich framework for planning and control problems. In settings where multiple agents interact, POMDPs prove to be inadequate. The interactive partially observable Markov decision process (I-POMDP) is a new paradigm that extends POMDPs to multiagent settings. The added complexity of this model due to the modeli...


Improved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)

We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDP (I-POMDP) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...




Publication date: 2012